Interpretable and Explainable AI

Explaining Machine Learning Models

Mateusz Cedro

Table of Contents:

  0. Importing Packages
  1. Data
  2. XGBoost Model Training
  3. SHAP Explanations
  4. Variable Importance Differentiation
  5. Positive and Negative Attributions for the Same Variable
  6. From XAI Explanations to Insights
  7. Comparison of XGBoost and SVM Models
  8. Playing Around with Visualization

Brain Stroke dataset - https://www.kaggle.com/datasets/jillanisofttech/brain-stroke-dataset/

0. Import Packages

1. Data
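The preprocessing step might look like the sketch below. A tiny synthetic frame stands in for the Kaggle CSV (the column names follow the brain stroke dataset, but the values here are made up), and the categorical columns are one-hot encoded so the models can consume them.

```python
import pandas as pd

# Synthetic stand-in for the brain-stroke CSV (values are illustrative).
df = pd.DataFrame({
    "age": [67, 80, 49],
    "hypertension": [0, 0, 1],
    "heart_disease": [1, 0, 0],
    "avg_glucose_level": [228.7, 105.9, 171.2],
    "bmi": [36.6, 32.5, 34.4],
    "gender": ["Male", "Female", "Female"],
    "work_type": ["Private", "Self-employed", "Private"],
    "Residence_type": ["Urban", "Rural", "Urban"],
    "smoking_status": ["formerly smoked", "never smoked", "smokes"],
    "stroke": [1, 1, 0],
})

# One-hot encode categoricals; drop_first avoids redundant dummy columns.
X = pd.get_dummies(df.drop(columns="stroke"), drop_first=True)
y = df["stroke"]
print(X.columns.tolist())
```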

2. XGBoost Model Training

Predictions for two chosen observations

3. SHAP Explanations

Next, for the same observations, I have calculated the decomposition of predictions into so-called variable attributions using SHAP.

For the first patient, the probability of having a stroke is most affected by the heart_disease=1 and age=67 factors. The factors with the least impact on the final prediction are residence_type_urban=1 and having no hypertension. The final predicted probability of a stroke for this patient is 70%.

For the second patient, the probability of having a stroke is most affected by the age=80 factor. However, smoking_status=never smoked decreases the probability to some extent. The factors with the least impact on the final prediction are having no hypertension and residence_type_urban=1. The final predicted probability of a stroke for this patient is 53%.

The break-down plots for observations 0 and 1 are shown above.

4. Variable Importance Differentiation

Find any two observations in the dataset that have different variables of the highest importance.

I have found that observations 1 and 2 have different variables of the highest importance: for observation 1 the two most important variables are age and smoking status, whereas for observation 2 they are avg_glucose_level and private work type.

5. Positive and Negative Attributions for the Same Variable

Observations 2 and 12 have different attribution values for the BMI index: observation 2 has a negative attribution, whereas observation 12 has a positive one.

6. From XAI Explanations to Insights

Explaining predictions

Global SHAP for a tree-based model

7. Comparison of XGBoost and SVM Models

Train the SVM model

Explain predictions with KernelSHAP

Compare explanations of XGBoost and SVM models

Here we can see large differences between the explanations produced by the XGBoost and SVM models for the first two observations.

8. Playing Around with Visualization

The plot above is interactive - feel free to play with it!

Mateusz Cedro - XAI on the brain stroke dataset